Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings
Abstract
Previous uses of gaze (eye movement) to improve Automatic Speech Recognition (ASR) performance have involved shifting language model probability mass towards the subset of the vocabulary whose words are related to a person’s visual attention. To improve ASR performance in acoustically noisy settings by using gaze information selectively, we propose a ‘Selective Gaze-contingent ASR’ (SGC-ASR). By modelling the relationship between gaze and speech conditioned on noise level (a ‘gaze-Lombard effect’), simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech babble noise conditions yields improvements in word error rate (WER). The work highlights the use of gaze information in dynamic model-based adaptation methods for noise-robust ASR.
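As an illustration only, the idea of shifting language model probability mass towards gaze-related words, with the strength of the shift depending on noise level, might be sketched as follows. This is not the SGC-ASR method from the paper: the function names, the unigram model, the linear SNR-to-boost mapping, and the assumption that gaze is weighted more heavily at lower SNR are all hypothetical choices made for this sketch.

```python
def shift_probability_mass(unigram, gaze_words, boost):
    """Scale the probability of gaze-related words by `boost`, then renormalise.

    `unigram` maps word -> probability; `gaze_words` is the set of words
    related to what the speaker is looking at (both hypothetical inputs).
    """
    if boost <= 0:
        raise ValueError("boost must be positive")
    scaled = {
        word: prob * (boost if word in gaze_words else 1.0)
        for word, prob in unigram.items()
    }
    total = sum(scaled.values())
    return {word: prob / total for word, prob in scaled.items()}


def noise_dependent_boost(snr_db, max_boost=4.0, low_snr_db=0.0, high_snr_db=20.0):
    """Hypothetical mapping from SNR to boost (an assumption, not from the paper).

    Returns `max_boost` at or below `low_snr_db`, 1.0 (no adaptation) at or
    above `high_snr_db`, and interpolates linearly in between.
    """
    if high_snr_db <= low_snr_db:
        raise ValueError("high_snr_db must exceed low_snr_db")
    t = (snr_db - low_snr_db) / (high_snr_db - low_snr_db)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    return max_boost + t * (1.0 - max_boost)


# Toy vocabulary: the speaker is looking at a cup.
unigram = {"cup": 0.1, "plate": 0.1, "the": 0.8}
gaze_words = {"cup"}

noisy = shift_probability_mass(unigram, gaze_words, noise_dependent_boost(0.0))
quiet = shift_probability_mass(unigram, gaze_words, noise_dependent_boost(20.0))
```

In this toy setting the adapted model assigns more probability to "cup" when the SNR is low and leaves the distribution unchanged when the SNR is high, which is one simple way to make the use of gaze selective.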
Similar resources
Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation
Using information from a person’s gaze has potential to improve ASR performance in acoustically noisy environments. However, previous work has resulted in relatively minor improvements. A cache-based language model adaptation framework is presented where the cache contains a sequence of gaze events, classes represent visual context and task, and the relative importance of gaze events is conside...
Reduced complexity equalization of lombard effect for speech recognition in noisy adverse environments
In real-world adverse environments, speech signal corruption by background noise, microphone channel variations, and speech production adjustments introduced by speakers in an effort to communicate efficiently over noise (Lombard effect) severely impact automatic speech recognition (ASR) performance. Recently, a set of unsupervised techniques reducing ASR sensitivity to these sources of distort...
The selective use of gaze in automatic speech recognition
The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to in laboratory assessments. Being a major source of interference, acoustic noise affects speech intelligibility during the ASR process. There are two main problems caused by the acoustic noise. The first is the speech signal contamination. The second is the speakers’ vocal and non-voc...
ICA 2010 paper
Automatic speech recognition (ASR) in noisy environments is regarded as a challenging topic. In particular, speech observed in noisy environments is considerably distorted compared with neutral speech observed in quiet ones. This distortion is called the Lombard effect, and it degrades ASR performance. It should occur strongly when the speaker has no auditory feedback. In conven...
Online Noise and Lombard Effect Compensation for In-Vehicle Automatic Speech Recognition
Presence of background noise in speech impacts the performance of automatic speech recognition (ASR). Adverse noisy environments are also known to induce so-called Lombard effect (LE), where speakers adjust their speech production in order to preserve intelligible communication. LE leads to further ASR degradation, often stronger than the one due to noise. Recently, a set of techniques reducing...